The equations and the iterations converge when the well-posedness condition is satisfied, as we mention in line 199.
We thank all reviewers for their comments; the following responses will be reflected in the final version. In fact, the convergence is exponential both in theory and in practice. Duchi et al. (2008) proposed an ... Thus we focus on the comparison on the graph classification task. More experiments will be added. Global methods like Geom-GCN employ additional embedding approaches to capture global information.
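As a generic illustration of the exponential convergence claimed above (this is a standard contraction-mapping sketch, not the paper's algorithm; the map, fixed point, and constants below are made-up examples), an iteration whose update is a contraction has error shrinking by a constant factor per step, i.e. geometrically:

```python
# Illustrative only: f(x) = 0.5*x + 1 is a contraction with Lipschitz
# constant L = 0.5 and fixed point x* = 2. Under such a contraction
# (one common form of a well-posedness condition), the error
# |x_k - x*| decays like L^k, i.e. exponentially in the iteration count.
def fixed_point_iterate(f, x0, steps):
    x = x0
    errors = []
    for _ in range(steps):
        x = f(x)
        errors.append(abs(x - 2.0))  # distance to the known fixed point x* = 2
    return x, errors

x, errors = fixed_point_iterate(lambda x: 0.5 * x + 1.0, x0=10.0, steps=20)
# Consecutive errors shrink by exactly the contraction factor 0.5.
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1) if errors[i] > 0]
```

After 20 steps the iterate is within `8 * 0.5**20 ≈ 8e-6` of the fixed point, which is the "exponential in theory and in practice" behavior in miniature.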
Reviewer #1
Q1: ...the claim that the algorithm really manages to align the latent distributions of real and simulated data... We will revise the inappropriate statements in the final version.
Q2: In the model adaptation phase, are state-action pairs simply sampled randomly from their respective buffers? Do you have results for a single, monolithic model?
Q4: Did you investigate the reasons for the slow learning in the 500 steps on InvertedPendulum compared to PETS?
Q1: The experiments shown in Figure 2 do not outperform MBPO beyond the confidence bounds.
To the reviewers: we will make all suggested minor corrections in the final version and address the main concerns below. This provides new perspectives on acceleration. In terms of experiments, SVR-ADA is compared with SOTA finite-sum solvers. If we use the one-norm, then it can only represent the general convex setting. In the final version, we will rewrite the abstract to make it clearer.
M-flows
We thank the reviewers for their insightful feedback! Our goal is not to reduce the dimensionality further below n. What are the convergence properties of the proposed training method (R4)? Is the sequential or alternating training scheme better (R4)? It would be nice to have a different metric to compare the models (R1).
...solid [R1, R3, R4], our experimental results valuable [R2, R3, R4], and our paper well-written [R1, R3, R4].
We only included a single environment (Pusher-v2) in the main paper in order to save space. We will include the suggested references in the paper. See also About multi-step rollouts. The reviewer suggests that the paper should first "show that minimizing the TD-error is not ..." Notice, however, that despite being commonly used and thought of as "intuitive", ... Furthermore, Figure 1 indeed shows that minimizing the TD-error can lead to a critic that is far from the ideal one. We did not write that "model-based RL has no advantage in terms of sample-efficiency than model-free RL".
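To make the TD-error point concrete, here is a generic sketch (not the paper's implementation; the two-state chain, reward, and discount below are made-up numbers) showing why a small TD error does not imply the critic is close to the true value function: shifting the critic by a constant c changes each one-step TD error only by (gamma - 1) * c, so a critic far from the ideal one can still have small TD errors.

```python
import numpy as np

# Hypothetical deterministic 2-state loop (0 -> 1 -> 0), reward 1 per step.
gamma = 0.9
transitions = [(0, 1.0, 1), (1, 1.0, 0)]  # (state, reward, next_state)

def td_errors(v):
    """One-step TD errors: delta = r + gamma * V(s') - V(s) per transition."""
    return np.array([r + gamma * v[s2] - v[s] for s, r, s2 in transitions])

v_true = np.array([10.0, 10.0])  # exact value: 1 / (1 - gamma)
v_shifted = v_true + 10.0        # sup-distance 10 from the true critic

max_err_true = np.abs(td_errors(v_true)).max()       # 0.0 for the true V
max_err_shifted = np.abs(td_errors(v_shifted)).max() # only 1.0 despite the shift
```

Here the shifted critic is a distance of 10 from the true value function, yet its worst TD error is only 1.0: minimizing TD error bounds Bellman residuals, not the distance to the ideal critic.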